Polymorphic mesh network image processing system.



Polymorphic mesh uses physical mesh connection to form twelve useful connection patterns for each of the
processing elements (2) making up an image processor of cellular automata under software control. Each
processing element includes a limited mesh of interconnections to related processing elements. This provides
for programmable choice of network configuration. The limited mesh of network interconnections is controlled by
information stored in a register within the affected processing element. The interconnection pattern controlled by
this information is invoked by programming, or by the combination of programming and process data, so as to
configure the network of processing elements dynamically in the desired mesh. Representative configurations
are:

string; mesh; tree; cube; pyramid.
POLYMORPHIC MESH NETWORK IMAGE PROCESSING SYSTEM



BACKGROUND OF THE INVENTION 
1. Field of the Invention

This invention relates to an array processor made up of a network of processing elements, and more
particularly relates to an array processing network in which each processing element in the array of
processing elements is equipped with a program-accessible connection control mechanism with a
controllable limited mesh of interconnections to related processing elements, so as to provide a programmable
choice of network configuration.



2. Description of the Prior Art 
The following publications are representative of the prior art:



Published Articles 

1. Sternberg, "Biomedical Image Processing," Computer, Jan. 1983, shows an array of cellular
automata of identical cells connected to their nearest neighbor for iterative neighborhood processing of
digital images. A serpentine shift register serially configures neighborhood inputs to the 3 x 3 neighborhood.

2. Turney et al, "Recognizing Partially Occluded Parts," IEEE Transactions on Pattern Analysis and
Machine Intelligence, July 1985, pp. 410-421, shows the use of a variety of techniques including Hough
transform and weighted template matching.

3. Mudge et al, "Efficiency of Feature Dependent Algorithms for the Parallel Processing of Images,"
IEEE 0190-3918/83/0000/0369, 1983, 369-373, shows how the architecture of an image processing system
can benefit from configuration as multiple subimage processors in which processing elements communicate
through some form of communication network. The authors explore the difference between
feature-dependent algorithms and feature-independent algorithms.

4. Sternberg et al, "Industrial Morphology," shows the combination in a single system of image
processing and of pattern recognition.

5. D.E. Shaw, "The NON-VON Supercomputer," Internal report, Columbia University, Aug. 1982,
shows a massively parallel system with an I/O switch in each processing element and with flag registers to
activate and deactivate individual PEs.

6. M.J. Kimmel, R.S. Jaffe, J.R. Mandeville, and M.A. Lavin, "MITE: Morphic Image Transform
Engine, An Architecture for Reconfigurable Pipelines of Neighborhood Processors," IBM RC11438, Oct. 10,
1985, shows a reconfigurable network of processing elements capable of a variety of interconnections of PE
to PE via bus connections under operator control.

7. A.J. Kessler and J.H. Patel, "Reconfigurable Parallel Pipelines for Fault Tolerance," IEEE,
CH1813-5/82/0000/0118, 1982, shows reconfigurable pipeline connection for graceful degradation.

8. S. R. Sternberg, "Parallel Architecture for Image Processing," IEEE, CH1516-6/79/0000-0712,
1979, shows a PE network with full connectivity.

9. T. N. Mudge, E. J. Delp, L. J. Siegel and H. J. Siegel, "Image Coding Using the Multimicroprocessor
System PASM," IEEE, 82CH1761-6/82/0000/0200, 1982, shows processing element interconnection by
an interconnection network.

10. S. R. Sternberg, "Language and Architecture for Parallel Image Processing," Pattern Recognition
in Practice, North-Holland Publishing Co., 1980, shows a complex network of PEs and explains operation.

Patents

• U. S. Patent 4,174,514, November 13, 1979, shows an array processor with adjacent neighborhood
processing elements interconnected for mutual access to the image data in overlap areas of two
adjacent image slices.

• U. S. Patent 4,215,401, CELLULAR DIGITAL ARRAY PROCESSOR, July 29, 1980, shows an array
processor in which each processing element "cell" includes two accumulators arranged to connect that cell
(except those at the array edge) with its two neighboring cells along one axis and with its two neighboring
cells along the orthogonal axis.

• U. S. Patent 4,380,046, Fung, MASSIVELY PARALLEL PROCESSOR COMPUTER, April 12, 1983, shows
an image processor which performs spatial translation by shifting or "sliding" of bits vertically or
horizontally to neighboring processing elements, P register to P register, as permitted by another register,
called G register. Each processing element includes an arithmetic, logic and routing unit (ALRU), an I/O unit
and a local memory unit (LMU). The ALRU constitutes three functional components, a binary counter/shift
register subunit, a logic-slider subunit (P register) and a mask subunit (G register).

• U. S. Patent 4,398,176, Dargel et al, DATA ANALYZER WITH COMMON DATA/INSTRUCTION BUS,
August 9, 1983, shows an image processing array in which each processing element includes external
command control lines to control whether bus information is to be used as instruction or as data. Each
processing element also includes mechanism to determine from the bit structure of an instruction whether
the instruction is local or global, and to stop forward transmission if local.

• U. S. Patent 4,601,055, Kent, IMAGE PROCESSOR, July 15, 1986, shows an iconic-to-iconic low level
image processor with pixel-by-pixel forward transformation and with pixel-by-pixel retrograde transformation.

The prior art provides for permanent configuration of processing elements in a pipeline or other pattern
image processing system. The prior art provides for reconfiguration of an image processing system via
switching networks and busses. The prior art provides for convenient bit transfer to adjacent processing
elements. The prior art does not, however, teach nor suggest the invention, which provides for very high
speed switching within the processing element programmable as a polymorphic mesh, so as to optimize
dynamically the configuration to the data being processed in a configuration such as: string; mesh; tree;
cube; pyramid.

Cellular automata have been found very useful for image processing, computer vision and other
computations in physics. All existing interconnection networks for cellular automata assume a fixed pattern
such as a string, a mesh, a tree, a cube or a pyramid, etc. Each pattern is good for certain types of
computing but is poor for computations that do not match the pattern. Since the network interconnection
pattern is built in, it is fixed; it cannot be changed even when a mismatch is detected. The mismatch leads
to poor efficiency.

For example, an NxN mesh is an optimal interconnection for local operations in image processing, but
its performance is poor in computing a global operation (e.g., it takes N cycles to compute MINIMUM, a
linear complexity). On the other hand, a tree interconnection is optimal for computing MINIMUM (it takes
only log N cycles, a logarithmic complexity) but is very inefficient in computing the local operations of an
image because of the lack of the neighborhood connection.
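
The cycle counts quoted above can be illustrated with a small Python sketch. It is only a toy model and not the patent's hardware: `chain_minimum` assumes a one-directional chain in which a running minimum advances one element per cycle, and `tree_minimum` assumes a binary reduction tree in which every level completes in one cycle. Both function names are hypothetical.

```python
def chain_minimum(values):
    """Running minimum passed along a one-directional chain, one hop per cycle."""
    running = values[0]
    cycles = 0
    for value in values[1:]:
        running = min(running, value)
        cycles += 1
    return running, cycles


def tree_minimum(values):
    """Pairwise reduction; each tree level is assumed to cost one cycle."""
    level = list(values)
    cycles = 0
    while len(level) > 1:
        level = [min(level[i:i + 2]) for i in range(0, len(level), 2)]
        cycles += 1
    return level[0], cycles


samples = [5, 3, 9, 1, 7, 2, 8, 6]
print(chain_minimum(samples))  # (1, 7)
print(tree_minimum(samples))   # (1, 3)
```

In this model the chain needs N-1 hops for N values, which is of order N, while the tree needs about log2 N levels.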

Besides the general inefficiency in computing when the interconnection does not match the algorithm,
the fixed-pattern approach is inflexible in designing an algorithm. This is mainly caused by the restriction in
data flow. For example, in the string interconnection, the data flow is one direction only, from left to right.
Such a restriction confines the algorithm domain; only the algorithms that have a "string" data flow can
benefit from the "string" network. In this regard, the fixed networks are very special-purpose and have a
very narrow range of applications.

Another disadvantage of the fixed interconnection pattern is that it does not efficiently support iconic
and intermediate processing simultaneously. This disadvantage is specific to one important application of
the cellular automata, computer vision, in which both iconic (or image) processing and the transformation
from iconic to symbolic information (called intermediate level processing) are two integral parts. A serious
implication of this disadvantage is the I/O problem, because the image after iconic processing needs to be
shipped outside of the network for further intermediate processing.



SUMMARY OF THE INVENTION

The object of the invention is to provide program-accessible convenient high speed connectivity to each
processing element in an array, so that processing elements may be programmably grouped in an effective
manner without incurring the cost of the complex connections required for universal connectivity.

Another object of the invention is to provide convenient programmable control of connectivity of each
processing element in the array, so that processing elements may be programmably regrouped from time
to time, both under adaptive control by the computer as it senses the need for optimization regrouping and
under operator control as the operator foresees the need for optimization regrouping.






0 257 581



Another object of the invention is to provide an external memory data connection to the processing 
element by which certain hardware may be eliminated and additional flexibility of operation may be gained. 

A feature of the invention is the use of an array of polymorphic mesh processing elements, each of
which has processing capability embodied in an arithmetic and logic unit with memory, and also has
programmable connection control capability with geographic destination connections and also logical
connections.

Another feature of the invention is a provision for programmable short-circuit capability in the
polymorphic mesh processing element. The short-circuit capability allows a series of intervening processing
elements to serve simply as wire equivalents in transmitting data from a sending PE to a remote PE without
cycle delay.

Another feature of the invention is a "polymorphic-mesh network," which is a composite network of a
conventional mesh external to each PE and an internal network within each PE, to accommodate standard
patterns and other new useful patterns through software control so that the connection can be matched to
the computing. Through the "polymorphic" feature, the network can "reshape" itself adaptively, to allow
flexible algorithm design and to cover wider application spectra.

A specific feature, related to computer vision, supports the intermediate level processing (iconic to
symbolic transformation) by the polymorphic mesh, resulting in an efficient architecture while avoiding the
serious I/O problem.

Another specific feature of the invention is the provision of flag registers in the processing elements for
use in conditional operations including reconfiguration for adaptive self-optimization and for fail-soft
capability.

Another feature of the invention is the provision of a limited number of multi-bit pattern registers, each
accessing a limited number of software selectable hardwired patterns, useful in cellular automata, which can
be formed by the polymorphic mesh. These patterns include bus, several trees, cube and pyramid, each of
which is optimal for a related type of computation, selectable by a crossbar switch in response to a bit
pattern in a selected one of the pattern registers.

An advantage of the invention is its high throughput speed at relatively low cost, achieved by providing
programmable limited connectivity within the processing element so as to permit optimization of array
connection of processing elements both physically and electronically.

Another advantage is that results of intermediate level processing may be used in conjunction with
programming to provide adaptive reconfiguration efficiently as a joint function of programming and
intermediate results of processing. This means that intermediate data can be used to invoke an adaptive
optimization regrouping function; adaptive self-optimization and fail-soft capabilities result.

Another advantage is the simplicity and two-dimensional aspect of the invention, which permits vast
numbers of interconnectable processing elements to be arrayed on a small number of chips.

The foregoing and other objects, features and advantages of the invention will be apparent from the
more particular description of the preferred embodiment of the invention, as illustrated in the accompanying
drawings.


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a system block diagram of an image processor made up of a network of polymorphic mesh
processing elements, with one processing element shown in functional block diagram form.

FIG. 2 is a diagram of the switching capability of the connection control mechanism CCM of a
polymorphic mesh processing element.

FIG. 3 is a functional block diagram of the connection control mechanism of a polymorphic mesh
processing element.

FIG. 4 is a diagram showing formation of linear string arrays from polymorphic mesh processing
elements.

FIG. 5 is a diagram showing formation of row trees from polymorphic mesh processing elements.

FIG. 6 is a diagram showing an alternative view of row trees.

FIGs. 7-10 are simplified diagrams showing chip area comparison and communication distance
improvement as a result of using polymorphic mesh processing elements.

FIG. 11 is a diagram showing formulation of reverse row trees from polymorphic mesh processing
elements.

FIGs. 12-20 are diagrams showing representative choices of array configuration conveniently
achievable by programmable interconnection of polymorphic mesh processing elements.

FIG. 21 is a more detailed block diagram of an individual polymorphic mesh processing element
according to a preferred embodiment of the invention.



DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

FIG. 1 shows a polymorphic mesh network image processing system made up of an MxM array 1 of
processing elements 2 under control of host computer H 3. Each processing element has a limited set of
connections; in the preferred embodiment there are four connections, one connection to each of its adjacent
orthogonal (non-diagonal) neighbors. These orthogonal neighbors will be referred to as Cartesian neighbors.
These orthogonal connections are identified for processing element 4 with the directional identifications
NESW. The function of these connections is to present the output of any processing element directly to a
limited number of its neighbors (four in the preferred embodiment). Overall programming control and
housekeeping control is by host computer 3, via bus 9.

One processing element is shown in greater detail. Processing element 5 is shown expanded to provide
drawing space for internal organs ALU 6, MEM 7 and CCM 8 and NESW connections.

Each processing element (PE) in the preferred embodiment is equipped with four Cartesian
connections. These Cartesian connections are designated NESW for convenience in discussion. They connect to
the respectively adjacent PEs in the designated directions. This simple mesh of Cartesian connections
would be capable, without CCM 8, of making connection to the adjacent PE, which is a very important
capability in image processing. This simple mesh is susceptible to convenient very large scale integration
(VLSI) manufacturing.

Non-Cartesian PEs are not wired for direct connection. Diagonal connection and remote connection are
not available on wires or metallization. Such connections would make manufacturing much more difficult;
bundles of wires would provide bulk distance with its inherent speed-of-light delay.

In the preferred embodiment, non-Cartesian PEs are accessed via intervening Cartesian PEs through
programmed control of their respective CCM 8 capability. The CCM 8 pattern of connection is such that the
input is effectively short-circuited to the desired output. Connection may follow the pattern of the chess rook
(straight Cartesian with optional extension) or the chess knight with optional extension (Cartesian X,
Cartesian Y = remote off-Cartesian) but not the chess bishop (diagonal with optional extension). Complex
routing patterns may be set up. Complex routing to a non-Cartesian destination may be set up to pass
through a great number of PEs without cycle delay.
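
The rook-style and knight-style routing described above can be sketched as a path made only of Cartesian hops. The following Python sketch is illustrative: the function names, the (x, y) coordinate convention and the choice to travel along X before Y are all assumptions, not details taken from the patent. The intervening elements on the path are the ones that would be short-circuited.

```python
def cartesian_route(source, destination):
    """List the PEs visited when travelling along X first and then along Y."""
    x, y = source
    goal_x, goal_y = destination
    route = [(x, y)]
    while x != goal_x:
        x += 1 if goal_x > x else -1
        route.append((x, y))
    while y != goal_y:
        y += 1 if goal_y > y else -1
        route.append((x, y))
    return route


def bypassed_elements(route):
    """Intervening PEs, which would be short-circuited rather than compute."""
    return route[1:-1]


route = cartesian_route((0, 0), (2, 1))
print(route)                     # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(bypassed_elements(route))  # [(1, 0), (2, 0)]
```

Every step in the returned route changes exactly one coordinate by one, so no diagonal wire is ever required, even for a destination that is neither in the same row nor in the same column.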

As an alternative to the simple Cartesian connections of the preferred embodiment, the limited set of
PE-PE interconnections might also include connections to diagonally adjacent PEs. Direct connections to
highly remote PEs, however, are prohibitively complex, considering that such connections can be made
under program control according to this invention by combinations of Cartesian or other simple connections.

FIG. 2 is a diagram of the switching capability of the connection control mechanism CCM 8 as shown in
FIG. 1. The essential function is as a switching network to connect any one of the connections NESW in the
X crossbar with any one of the connections NESW in the Y crossbar, under control of the bit values in a
pattern register. The preferred embodiment provides for selection between two pattern registers, with the
selection of pattern register controlled by a bit value in a pattern selection register. Note that FIG. 2 is
diagrammatic and does not show details of actual hardware. In the preferred embodiment not all of the
connections available in a 4x4 matrix are necessarily used, although all are indeed available. Matrix 10 is
shown as having inputs 11 in the Y dimension with connections to conductors 12 in the X dimension. As
shown, this is set up to connect N connector 13 via connection electronics 14 and 15 and intersection 16 to
E connection 17. Different connections to SWN connections 18, 19 and 20 may be alternatively or
simultaneously activated. The control is by pattern register 21 bit values or by pattern register 22 bit values.
Pattern register selection is by pattern register selection register 23. Pattern registers 21 and 22 are sixteen
bits each, as shown by the slash on the connection line with the 16. Connections to connection lines 24, 25
and 26 are also available. Crossbar switch 10 makes any one or plurality of sixteen connections as
controlled by the bit values in the selected one of pattern registers 21 or 22.

Each processing element connects to one or more of its neighbors as directed by the bit values in one
of its two pattern registers. Instantaneous change of connection can be made simply by switching control to
the alternate pattern register. Selection of pattern registers is by the bit value in a simple binary pattern
selection register. For purposes of explanation, the pattern registers may be considered standard pattern
register and alternate pattern register, respectively accessed by 1 or by 0 value in the binary pattern
selection register.
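
A sixteen-bit pattern controlling a 4x4 crossbar, with a one-bit selector choosing between two pattern registers, can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions: the bit layout (bit 4*i + j closes the switch from input port i to output port j, with ports ordered N, E, S, W), the class and function names, and the mapping of the selector value to a register index are all hypothetical and are not specified by the text.

```python
PORTS = "NESW"


def encode_links(links):
    """Pack (input_port, output_port) pairs into an assumed 16-bit layout."""
    pattern = 0
    for source, target in links:
        pattern |= 1 << (4 * PORTS.index(source) + PORTS.index(target))
    return pattern


def decode_links(pattern):
    """Unpack a 16-bit pattern into the list of closed crossbar switches."""
    if not 0 <= pattern < (1 << 16):
        raise ValueError("pattern must fit in sixteen bits")
    return [(PORTS[i], PORTS[j])
            for i in range(4) for j in range(4)
            if (pattern >> (4 * i + j)) & 1]


class ConnectionControl:
    """Two pattern registers plus a one-bit selector (hypothetical model)."""

    def __init__(self, pattern0=0, pattern1=0, select=0):
        self.patterns = [pattern0, pattern1]
        self.select = select

    def active_links(self):
        return decode_links(self.patterns[self.select])


ccm = ConnectionControl(
    pattern0=encode_links([("W", "E")]),              # pass-through, west to east
    pattern1=encode_links([("N", "E"), ("N", "S")]),  # fan out from north
)
print(ccm.active_links())  # [('W', 'E')]
ccm.select = 1             # switching patterns is a single-bit change
print(ccm.active_links())  # [('N', 'E'), ('N', 'S')]
```

The point of the sketch is that changing the whole connection pattern costs only one selector write, while the other register remains free to be reloaded.
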
FIG. 3 is a functional block diagram of CCM 8 of FIG. 1, used to implement the switching capability
described in FIG. 2 and also to carry out certain other direction and logical function controls. A simplified
logical connective capability 30 is shown for AND, OR, XOR (Exclusive OR), and ANDALLBIT capability.
Other capabilities are also available but as the complexity increases the cost increases. The connection
control mechanism receives flag inputs which are stored in first flag register F1 31 and second flag register
F2 32. These flags may be passed forward and may also be used to alter the control. Shift register mask 33
is provided to present outputs SRM 34 TRUE and SRM 35 COMPLEMENT, both of which are used to
select a limited subset of the SRX 37 and SRY 39 bit values to control logical connective box 30.

The X register 36 and SRX shifter 37 function to control the logical connective and also the geometrical
connection to a neighboring processing element. Similarly, Y register 38 and shift register Y SRY 39
function to control geometric and logical selection in the Y dimension. Usually X register 36 and Y register
38 contain the Cartesian coordinates of a processing element in the system. Shifters SRX 37 and SRY 39,
in conjunction with SRM 33, are used to derive any contiguous bit group of X and Y. Detailed functions of
FIG. 3 are outlined by twelve examples, formulating twelve different patterns from the polymorphic-mesh.
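
Deriving a contiguous bit group from a coordinate amounts to a shift followed by a mask. The Python sketch below shows that operation in software terms. The names `bit_group` and `is_subtree_root` are hypothetical, and the second function illustrates only one plausible way a PE might use its own X coordinate to decide its role in a row tree; that use is an assumption and not the patent's stated control algorithm.

```python
def bit_group(value, shift, width):
    """Return `width` contiguous bits of `value`, starting at bit `shift`."""
    mask = (1 << width) - 1
    return (value >> shift) & mask


def is_subtree_root(x, level):
    """Assumed role test: the low `level` bits of the coordinate are all zero."""
    return bit_group(x, 0, level) == 0


print(bin(bit_group(0b101101, 2, 3)))  # 0b11
print(is_subtree_root(8, 3))           # True
print(is_subtree_root(12, 3))          # False
```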

Each processing element is set up to carry out an arithmetical logical operation or logical transform
operation on values presented to it or, alternatively, to perform a no-op. In addition to the operation or
no-op, the PE is set to connect with selected neighboring processing elements. One such connection to
neighbor processing elements is to serve as a short circuit. The short circuit connection is essentially
instantaneous. Connection occurs at the speed of electricity (speed of light) rather than with a one-cycle
delay as is common with ordinary operations, including no-op. It is thus possible to bypass several
processing elements in order to make the desired connection to a processing element which is a
non-adjacent Cartesian neighbor. It is possible to make a zigzag move, to make connection to a remote
processing element which is neither in the same column nor in the same row. In this fashion, even though
the array connections are without diagonals, connection can be made to adjacent diagonal or remote
diagonal or off-diagonal remote processing elements.

While additional explanation will be made, it should be understood at this point that the polymorphic
mesh image processing system, properly program controlled, can be configured to carry out a processing
effort involving a complex interconnection of processing elements. Each processing element does its
appropriate arithmetic or logical operation or transform operation or no-op, with respect to information
provided to it, or the processing element may be short-circuited so as to be bypassed without cycle delay.

A certain amount of sophistication in connections and in connection logic might be available as a
function of the number of bit values in the pattern registers, and as a function of the number of pattern
registers, of the connection control mechanism. But such sophistication would be costly, because of the
large number of replications required by the large number of processing elements. In order to keep costs of
the processing element down, costs being measured in terms not only of money but of complexity and path
length, the preferred embodiment has a limited repertoire of optimized connections. This repertoire is
depicted in FIGs. 4 through 20.

FIG. 4 shows the formulation of linear arrays. There is a west to east linear array 41, and there is a
north to south linear array 42. FIGs. 5 and 6 show two techniques for forming row trees. In the first
technique, X strings 45, 46, 47 and 48 through 51 are shown in FIG. 5. These X strings of row trees are of
different lengths. FIG. 6 shows a tree in its more traditional presentation as tree connection 52. In this
traditional tree configuration seven inputs filter out to a single processing element in four steps.

FIGs. 7-10 show a chip area comparison and communication distance improvement as a result of using
polymorphic mesh processing elements. The chip area for polymorphic mesh 61 as shown in FIG. 7 is
quite compact as contrasted to pattern 62 of an orthogonal tree, FIG. 9. The average communication
distance in a 3x3 window, as shown in FIG. 8 pattern 63, is 1.5 using the polymorphic mesh. The same 3x3
window 64 in FIG. 10 has an average communication distance of 2.125.
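
The figure of 1.5 for the mesh is consistent with counting Cartesian hops from the center of a 3x3 window to each of its eight neighbors: four neighbors are one hop away and four diagonal neighbors are two hops away. The Python sketch below checks that arithmetic. This reading of "average communication distance" is an assumption, the function name is hypothetical, and the 2.125 figure for the orthogonal tree depends on the layout in FIG. 10 and is not reproduced here.

```python
def mean_window_distance(radius=1):
    """Mean Cartesian hop count from the window center to every other cell."""
    offsets = [(dx, dy)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               if (dx, dy) != (0, 0)]
    total = sum(abs(dx) + abs(dy) for dx, dy in offsets)
    return total / len(offsets)


print(mean_window_distance())  # 1.5
```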

FIG. 11 is a diagram showing formulation of reverse row trees from polymorphic mesh processing
elements. This is very similar to row trees shown in FIG. 5. While a row tree is used to collect information
from its leaves, a reverse row tree is to distribute information to its leaves.

FIGs. 12-20 show selected choices of array configuration conveniently achievable by programmable
interconnection of polymorphic mesh processing elements. Certain ones of the processing elements in the
figures are shown as squared circles, to indicate processing operations, as contrasted to intervening
processing elements used simply for pass-through, which are shown as simple circles. Note that each PE
can perform both the pass-through function and a processing operation, as programmed. Each pattern is
reduced to a set of sixteen bit values and presented via the appropriate pattern register.
FIG. 21 is a detail of the preferred embodiment polymorphic mesh processing element according to the
invention. Note that much of the mechanism shown is relatively standard. This relatively standard hardware
is shown outside the broken line box 93. The standard hardware features ALU 96 which provides outputs 1
and 2 as well as appropriate input multiplexers. Inputs may come via instruction terminal 94 and also from
registers N, S, E and W 95, which in turn are fed from memory 1 M1, memory 2 M2, ALU output 1 or output 2.
The local memory at 101 is available for appropriate purposes of computation and housekeeping related to
the ALU. The local memory 101 can be extended to the external memory whose content is fed into the
processing element via an external memory data wire EMD 102. The local and external memory content are
multiplexed via multiplexer 103 and connected to M1 or M2. Output signals out 1 and out 2 are fed
back to the same processing element 93 and may also be fed to other processing elements as selected by
the CCM.

The EMD connection and plurality of buses M1 and M2 provide for an external memory data connection
to the internal memory means, processing means and connection control mechanism, whereby the
processing element may be operating with local memory and external memory simultaneously.

The connection at EMD 102 is important in that it permits direct connection of the individual PE to an
external memory which may be in the host H 3 or may be a standalone memory not shown. This EMD
connection, not present in ordinary processing elements in array processors, makes it possible to provide the
equivalent of FIG. 3 from an external memory. The EMD connection also makes possible a supplement to
the hardware of FIG. 3, as well as great flexibility of operation and setup of polymorphic-mesh processing
element networks.

The switching capability implemented in FIG. 3 via its directional and logic function control is related to
external memory and EMD 102. A wide spectrum of implementation means for FIG. 3 is possible, ranging
from having all functions illustrated in FIG. 3 resident in each and every processing element to having all
these functions resident external to all PEs but delivering the resultant condition via connection EMD 102 to
a pattern register selection register Rp 99.

Two pattern registers, PR0 97 and PR1 98, allow instantaneous switching from one connection pattern
to the other without loss of any instruction cycle. The instantaneous switching is controlled by a one-bit
register, the pattern register selection register (Rp 99). When it is determined that the values in one of the
two pattern registers are no longer necessary for use, that pattern register can be loaded with a new
pattern. Such loading can be free, that is, can be carried out at the same time as the concurrent ALU
operation. It must be remembered that the processing elements in image processing systems are normally
one-bit processing elements, and in any case are relatively simple. It is a relatively significant effort to load
the pattern registers 97 and 98, which are 16 bits each. Pattern register selection register 99 must also be
loaded, but this is a single bit.

There are occasions when it is necessary to close down processing to reload pattern registers 97 and
98. If this occurs it would normally take 32 cycles to load the pattern registers and a 33rd cycle to load the
pattern register selection register 99. In many cases, however, there need be no time used specifically for
loading the pattern registers. In the case of appropriate instructions, known to be appropriate, the operator
can load one bit into the pattern register 97 or one bit into the pattern register 98 or one bit into the pattern
register selection register 99. During the cycle in which ALU 96 is carrying out an arithmetic or logical
operation, over a period of time during which the arithmetic and logical unit 96 is operating at full capacity,
it thus may be possible to reload one or all of the registers in CCM. Such free loading is by a path from
instruction terminal 94 via multiplexers in ALU 96 and feedback from out 1 or out 2, assuming appropriate
setup of instruction gate 100 to carry out the loading function.
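
The 33-cycle figure follows from loading one bit per cycle: sixteen bits for each of the two pattern registers plus one bit for the selection register. The Python sketch below models that bit-serial loading. The function names, the most-significant-bit-first order and the sample register values are illustrative assumptions only.

```python
def to_bits(value, width):
    """Split a value into `width` bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def shift_in(bits):
    """Shift bits into a register one per cycle; return (value, cycles)."""
    value = 0
    cycles = 0
    for bit in bits:
        value = (value << 1) | bit
        cycles += 1
    return value, cycles


pr0, cycles0 = shift_in(to_bits(0xA5C3, 16))  # first pattern register
pr1, cycles1 = shift_in(to_bits(0x0F0F, 16))  # second pattern register
rp, cycles_rp = shift_in([1])                 # one-bit selection register
print(hex(pr0), hex(pr1), rp)                 # 0xa5c3 0xf0f 1
print(cycles0 + cycles1 + cycles_rp)          # 33
```

When the loading is overlapped with ALU work as the text describes, these cycles are hidden rather than added to the processing time.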

Note that the selected sixteen bit values from pattern registers 97 or 98 control the detailed
setting of the 4x4 crossbar switch 104. Its detail is shown in FIG. 2.

In a typical operation the processing element will be either short circuited or active. If short circuited, a
pattern register such as register 97, set for short circuit connection, takes over and carries out the short
circuit connection via that processing element to one or more other processing elements. In the case when
the processing element is active, pattern register selection register 99 would be set for action and would
switch control from standard pattern register 97, set for bypass via short circuit, to alternate pattern register
98, set for directing inputs and outputs for the activity.

These activities will be explained in the following paragraphs.

The polymorphic mesh, shown in FIGs. 1-3, is capable of a number of patterns limited by the
complexity of its connection control mechanism CCM. The CCM pattern repertoire, being the union of these
patterns, is optimal for the union of the computing types covered separately.






One control algorithm is described in the invention for each pattern respectively. All control algorithms
are simple to implement and, most importantly, use the same set of hardware to generate the desired pattern
on-line.

A hardware mechanism is depicted in the invention to carry out the formation of all patterns in a
systematic and consistent way. The mechanism is simple to implement and very suitable for VLSI
implementation.

As a most distinguished feature of the polymorphic-mesh, many algorithms of linear complexity (O(N))
are reduced to logarithmic complexity (O(log N)), a theoretical optimum. Therefore the N/log N speedup is
gained by the architectural novelty; for a network of 1024x1024, the speedup is about 100.
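
The quoted speedup can be checked with one line of arithmetic. The Python sketch below assumes a base-2 logarithm and N = 1024 (the side of the network); both are assumptions about how the figure was obtained, and the function name is hypothetical.

```python
import math


def speedup(n):
    """Ratio of N steps to log2(N) steps."""
    return n / math.log2(n)


print(speedup(1024))  # 102.4
```
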
70 Specific to computer vision, after the iconic processing by one pattern (mesh), the resulting image is 
not shipped outside of the network. Rather, it is further processed by another pattern (e.g. tree) to transform 
the iconic infbrmation to symbolic information (e.g. how many pbeels whose values are greater than 133?). 
Because of the polymorphic capabiFity, the data do not have to be output, consequently the I/O rate Is 
significantly reduced (e.g. five orders of magnitude reduction for the "how many" example in a 1024x1024 
75 Image). The saving from the I/O reduction contributes to the speedup on top of the speedup due to 
computing in a compound manner. 
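
The two figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 8-bit pixel depth is an assumption (suggested by the threshold 133, but not stated), and the function names are hypothetical.

```python
import math

def tree_speedup(n):
    """Ratio of O(N) steps to O(log N) steps for side length n."""
    return n / math.log2(n)

def io_reduction_orders(side, bits_per_pixel=8):
    """Orders of magnitude saved when only a pixel count leaves the array.

    bits_per_pixel = 8 is an assumption. The count needs enough bits to
    represent every value from 0 to side * side.
    """
    image_bits = side * side * bits_per_pixel
    count_bits = (side * side).bit_length()
    return math.log10(image_bits / count_bits)

print(tree_speedup(1024))          # 102.4
print(io_reduction_orders(1024))   # between 5 and 6
```

Under these assumptions the speedup is 102.4, and the I/O volume drops by between five and six orders of magnitude.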

Another pattern, Diagonal-Span-Tree, can be formed by the polymorphic mesh to facilitate the
computing of Ax + By + C in logarithmic time, where A, B and C are constants and (x, y) is the coordinate of a pixel.
This capability is useful in both computer graphics and computer vision. For computer graphics, it is useful
in displaying convex polygons, in creating shadows, in clipping, in drawing spheres, in computing adaptive
histogram equalization, in texture mapping and in anti-aliasing. For computer vision, such a capability is useful
in generating a line mask, a band mask and a polygonal mask. It is also useful in computing the Fast Hough
Transform and its inverse for detecting lines in a noisy image, and in other applications.

The preferred embodiment includes a hardware mechanism to generate twelve useful cellular automata
patterns and twelve control algorithms, one for each pattern, to reshape the polymorphic mesh into the
corresponding pattern under software control. The following sections describe the polymorphic hardware
mechanism and the control algorithms.

One noticeable feature of the polymorphic mesh is the capability to "reshape" itself adaptively to the
condition of the processing. As described previously, the pattern registers 97 and 98 can be loaded
concurrently with the ALU operation; as a result, it can be considered that each PE has P patterns
(P1, ..., Pp) at its disposition.

The reshaping that is adaptive to the condition C of the processing allows each PE to assume a pattern
Pi as a function of C. Initially, all PEs start from a pattern, say P1, and processing begins under the initial
pattern. As processing proceeds, each PE senses its local condition (this can be a test of convergence of a
value, for example) and decides whether to stay in the current pattern or replace it with a new pattern.

A condition can be global, meaning that the host can sense each PE condition collectively, then decide
the choice of pattern and feed it back to all PEs. A condition can also be local. The new pattern may be fed
to an individual PE, or to a group of PEs as appropriate.

Choice of a new pattern can be made in several ways. 

(1) prescheduled:

a sequence of patterns (e.g. P1, P2, ..., Pp) is scheduled and will be adopted in that priority order. When
the need for a new pattern is detected (while using Pi), the next pattern P(i+1) will be used. Then P(i+2)
will be loaded, when possible, into the unused pattern register concurrently with the operation.

(2) function of C:


An example to show the adaptive reshaping is as follows:
A pattern of 3x3 window is established as P1 to allow a filtering operation, and an index is identified as
condition C to measure the effectiveness of the filtering. The goal here is to increase the window size when
C is not satisfactory. In this regard, we establish P2 as a 5x5 window, P3 as 7x7, P4 as 9x9, etc.
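
The window-growing example can be sketched in software as follows. This is a minimal illustration, not the hardware procedure: the mean filter, the function name pick_window and the particular condition C used in the demonstration (at least three non-zero samples in the window) are all assumptions chosen for the example.

```python
def pick_window(image, row, col, is_satisfactory, sizes=(3, 5, 7, 9)):
    """Return (size, mean) for the first window size whose mean-filter output
    satisfies condition C; the largest size is used if none does."""
    height, width = len(image), len(image[0])
    for size in sizes:                      # P1, P2, P3, P4 in scheduled order
        half = size // 2
        values = [
            image[r][c]
            for r in range(max(0, row - half), min(height, row + half + 1))
            for c in range(max(0, col - half), min(width, col + half + 1))
        ]
        mean = sum(values) / len(values)
        if is_satisfactory(values, mean):   # condition C, sensed locally
            break
    return size, mean

# Hypothetical 9x9 image with three non-zero pixels on its diagonal.
image = [[0] * 9 for _ in range(9)]
image[2][2] = image[4][4] = image[6][6] = 9

def enough_samples(values, mean):
    """Hypothetical condition C: at least three non-zero samples."""
    return sum(v != 0 for v in values) >= 3

size, mean = pick_window(image, 4, 4, enough_samples)
print(size)   # 5
```

Here the 3x3 window around the centre pixel sees only one non-zero sample, so the PE moves on to the 5x5 pattern, where the condition is met.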

Another example of adaptive reshaping is the "soft fail" application. It is common to design a condition
C in each PE such that a malfunction is reflected. Such a condition can then be used to decide a new suitable
connection pattern Pi.



POLYMORPHIC MESH MECHANISM

The polymorphic mesh (Figure 1) consists of an array 1 of MxM processing elements (PE). Each PE
has four physical wires communicating with four non-diagonal neighbors, one for each neighbor. These
communicating wires are denoted as E, W, S and N as shown for PE 4. PE 5 is shown expanded to show
arithmetic and logical unit (ALU) 6, memory (M) 7, and connection control mechanism (CCM) 8. ALU 6
and memory 7 are relatively standard features in image processor processing elements. The connection control
mechanism is not standard, but rather is in the special configuration according to this invention.

As shown in FIGS. 1 and 2, each PE consists of three functional blocks: the connection control
mechanism 8, the memory block and the Arithmetic Logic Unit (ALU) block 6. The connection control
mechanism 8 takes four wires (E, W, N and S), ALU output and memory (M) as inputs and reroutes them as
outputs. The routing is accomplished by "SHORT_CIRCUITing" any input A (for example, 10 in FIG. 2)
to any output B. The signal appearing on wire A is logically the same as that on wire B, where A and B can be any of
the inputs to the connection control unit. For example, "SHORT_WE" will logically equalize wire W 24 and
wire E 26.

FIG. 3 shows the functional units of CCM 8. The action "SHORT_CIRCUIT" is conditional and the
conditions are created by the connection control mechanism shown in Figure 3. Each PE has two 1-bit flags
F1 31 and F2 32 for generating condition signals. Each PE is equipped with one shift register SRM 33
which can shift in both directions logically or arithmetically and supply true and complement outputs 34 and
35.

Each PE contains a pair of registers X 36 and Y 37 where register X holds the PE's row position (0 ≤ X
≤ M-1) and register Y holds the PE's column position (0 ≤ Y ≤ M-1). Registers X and Y each have a shift
register, SRX 38 and SRY 39, which can be loaded with the content of X and Y respectively and can shift
in both directions logically or arithmetically. The bit shifted out of SRX is BSRX and that of SRY is BSRY.

Several functions are provided to generate the condition on which the "SHORT_CIRCUIT" action is
based.

LOAD reg value: this function will load the "value" into SRM, F1 or F2 via instruction or memory;
COPYSR reg: this function will copy reg X to reg SRX, or Y to SRY, or both;
AND/OR/XOR reg: this function performs one of the following:
- "AND/OR/XOR" X with SRX and X with SRM (or inverted SRM);
- "AND/OR/XOR" Y with SRY and Y with SRM (or inverted SRM);
- both of the above;
ANDALLBIT reg: this function performs "AND" on all bits of reg; it produces one condition bit XANDALL if
reg is X, or one condition bit YANDALL if reg is Y, or both condition bits.

The "SHORT_CIRCUIT" action is then based on the combination of BSRX, BSRY, XANDALL, YANDALL,
F1 and F2.
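
Two of these condition sources recur in every control algorithm that follows, so a small software model may help in reading the listings. The sketch below assumes a fixed register width and models only the values produced, not the hardware timing; the function names are hypothetical.

```python
def bit_shifted_out(position, t):
    """Bit t of a position register, as BSRX or BSRY would present it after
    t logical right shifts of SRX or SRY."""
    return (position >> t) & 1

def andallbit(value, width):
    """ANDALLBIT: 1 if all `width` low-order bits of value are 1, else 0."""
    full = (1 << width) - 1
    return int((value & full) == full)

print(bit_shifted_out(0b110, 0))   # 0
print(bit_shifted_out(0b110, 1))   # 1
print(andallbit(0b111, 3))         # 1
print(andallbit(0b101, 3))         # 0
```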

The remaining two functional blocks of the polymorphic mesh are rather similar to the conventional
design. The memory block can be viewed as a storage that can deliver one bit to the connection control
block and/or accept one bit from it per machine cycle. The ALU, although similar to the conventional
design, features a "conditional" response to the combination of bits BSRX, BSRY, XANDALL and
YANDALL in choosing a "SEND" or a "RECEIVE" action along with the "SHORT_CIRCUIT" action.

With the polymorphic-mesh mechanism, twelve patterns formed by the polymorphic mesh are
described in the following. Their corresponding control algorithms are described in order.

CONTROL ALGORITHMS
(P1) Linear Array

As shown in Figure 4, a row linear array of length MxM and a column linear array of the same length
can be formed from an MxM polymorphic mesh by connecting S of PE (M-1, i) to N of PE (0, i+1) and
connecting E of PE (i, M-1) to W of PE (i+1, 0). The N of PE (0, 0) and S of PE (M-1, M-1) are the
beginning and the end of the column linear array respectively, while the W of PE (0, 0) and E of PE (M-1,
M-1) are the beginning and the end of the row linear array.

The control algorithm is as follows. At each PE cycle,

LINEAR ()
{
    MEM = W;    /* action 1 */
    E = MEM;    /* action 2 */

    MEM = N;    /* action 3 */
    S = MEM;    /* action 4 */
}



Action (1) takes the datum on W into MEM at the end of the cycle while action (2) puts the content of MEM
(at the beginning of the cycle) on E. In combination, actions (1) and (2) create the row array. The data
injected through W of PE (0, 0) will march eastbound and after M cycles, the first row will be filled. After
another M cycles, the data in the first row will march to the second row while the new data will fill the first
row. Actions (3) and (4) create a column linear array in a similar fashion.

The formation of the linear array is unconditional. All PEs take the same action. The mechanism
in the connection control mechanism is not utilized.
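
The marching behaviour of the row linear array can be reproduced with a short simulation. This is a sketch under the wiring described above (E of the last PE in a row feeds W of the first PE in the next row); the function name linear_step and the use of None for empty memories are assumptions.

```python
def linear_step(mesh, injected):
    """One PE cycle of actions (1) and (2) on an M x M mesh.

    Every PE puts its old MEM on E and latches the datum on W. `injected`
    enters at W of PE (0, 0). Returns the new mesh and the datum leaving
    E of PE (M-1, M-1).
    """
    m = len(mesh)
    flat = [mesh[i][j] for i in range(m) for j in range(m)]  # row-major chain
    leaving = flat[-1]
    flat = [injected] + flat[:-1]
    new_mesh = [flat[i * m:(i + 1) * m] for i in range(m)]
    return new_mesh, leaving

mesh = [[None] * 3 for _ in range(3)]
for datum in (1, 2, 3):
    mesh, _ = linear_step(mesh, datum)
print(mesh[0])          # [3, 2, 1]
for datum in (4, 5, 6):
    mesh, _ = linear_step(mesh, datum)
print(mesh[0], mesh[1]) # [6, 5, 4] [3, 2, 1]
```

After M cycles the first row is full; after another M cycles its contents have moved to the second row, matching the description.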



(P2) Row Tree

M trees (one per row) can be formed from the polymorphic mesh by the following control algorithm.

ROW-TREE ()
{
    int t;              /* t is time step */
    int pid0;           /* column position of a PE, Process ID 0 */
    int M, logM;        /* M is the side size of the mesh and logM = log M */
    int treemask = 1;   /* a flag to construct the tree */

    for (t = 0; t < logM; t++) {
        if (¬treemask)
            {SHORT_WE; DISABLE;}
        if (treemask && ¬pid0<t>)
            E = MEM;
        if (treemask && pid0<t>)
            MEM = W;
        treemask = treemask & pid0<t>;
    }
}



Figure 5 illustrates the above control algorithm for an 8-PE case. At t=0, the "treemask"s for all PEs are
1; therefore, every PE is enabled and the "even PEs" send data to "odd PEs". This is indicated by an
arrow between every pair of "odd/even" PEs. This also forms the bottom level of the tree.

At t=1, by investigating treemask, only the PEs with pid0<0> = 1 (lowest bit of pid0) are enabled. The
disabled PEs are not shown by circles but they do establish the connection between PE 1 and 3, and PE 5
and 7 by the action SHORT_WE. This forms the second level of the tree.

The highest level of the tree is formed by the control at t=2. At this step, only PE 3 and PE 7 are
enabled; the rest of the PEs establish the connection by "SHORT"ing the W-E path but do not perform any operation
(or equivalently do not change any state of MEM).

An alternative view of this control algorithm is shown in Figure 6 with the nodes at each level of the tree
marked by the processor identification (pid) of the PEs.

The tree pattern is extremely useful in the paradigm of divide-and-conquer. Dedicated tree machines
have been built for special-purpose computing. The complexity of the algorithms in this paradigm is usually
O(log N) where N is the size of the input data. The same algorithms usually need O(N) execution time in a
mesh; this represents a 1024:10 speedup for N = 1024. Important algorithms in this category include MAX,
MIN, k-th largest, median, some/none, etc.

Using the mechanism in the connection control block, the control algorithm uses register X for pid0,
register SRX and flag F1 for "treemask". The content of the X register is copied to SRX so that the pid0<t> bit
is shifted out at time step t in BSRX, then ANDed with "treemask" to derive the final condition.
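
The ROW-TREE control flow can be exercised in software on a single row. The sketch below follows the listing step by step, with disabled PEs modelled as a shorted W-E wire. One point is an assumption: the listing only shows MEM = W, so the `combine` operation that merges the received datum with the PE's own value (MAX by default) is chosen here for illustration, as is the function name.

```python
def row_tree_reduce(values, combine=max):
    """Simulate ROW-TREE on one row of M = 2**k PEs.

    Returns the final MEM contents; the root is the PE with the highest pid0.
    """
    m = len(values)
    log_m = m.bit_length() - 1
    assert 1 << log_m == m, "row length must be a power of two"
    mem = list(values)
    treemask = [1] * m
    for t in range(log_m):
        new_mem = list(mem)
        for pid in range(m):
            if treemask[pid] and (pid >> t) & 1:    # receiver: MEM = W
                west = pid - 1
                while not treemask[west]:           # disabled PEs: SHORT_WE
                    west -= 1
                new_mem[pid] = combine(mem[pid], mem[west])
        mem = new_mem
        treemask = [tm & ((pid >> t) & 1) for pid, tm in enumerate(treemask)]
    return mem

row = [3, 9, 2, 7, 5, 1, 8, 4]
print(row_tree_reduce(row)[-1])                       # 9
print(row_tree_reduce(row, lambda a, b: a + b)[-1])   # 39
```

For eight PEs the result reaches PE 7 in three steps, against seven steps for a plain left-to-right sweep.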



(P3) Column Tree

Similar to Row Tree, M column trees can be formed by the polymorphic mesh with the row position of
the PEs, pid1, as the control and the N-S path as the connection. The control algorithm is as follows.

COLUMN-TREE ()
{
    int t;              /* t is time step */
    int pid1;           /* row position of a PE */
    int M, logM;        /* M is the side size of the mesh and logM = log M */
    int treemask = 1;   /* a flag to construct the tree */

    for (t = 0; t < logM; t++) {
        if (¬treemask)
            {SHORT_NS; DISABLE;}
        if (treemask && ¬pid1<t>)
            S = MEM;
        if (treemask && pid1<t>)
            MEM = N;
        treemask = treemask & pid1<t>;
    }
}



Column Trees are useful in transporting the data distributed column-wise in the mesh and converting
algorithms with linear complexity (O(N)) to logarithmic complexity (O(log N)) as discussed in the Row Tree
section.

Similar to ROW_TREE, the COLUMN_TREE control algorithm uses register Y for pid1, SRY to copy
Y, and F2 to hold "treemask". The condition to SHORT_NS is based on the ANDing of F2 and BSRY,
which produces pid1<t> at time step t.

(P4) Orthogonal Tree

Orthogonal Tree (FIG. 9) is a useful network for sorting, matrix operations, minimum spanning tree, FFT
and other graph algorithms. It can be formed from the polymorphic mesh by combining the Row Trees and
the Column Trees by the ORTH_TREE control algorithm below.



ORTH_TREE ()
{
    int t;                      /* t is time step */
    int pid0;                   /* column position of a PE, Process ID 0 */
    int pid1;                   /* row position of a PE */
    int M, logM;                /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;   /* flags to construct the tree */

    for (t = 0; t < logM; t++) {
        /* cycle 1 */
        if (¬hmask)
            {SHORT_WE; DISABLE;}
        if (hmask && ¬pid0<t>)
            E = MEM;
        if (hmask && pid0<t>)
            MEM = W;
        hmask = hmask & pid0<t>;

        /* cycle 2 */
        if (¬vmask)
            {SHORT_NS; DISABLE;}
        if (vmask && ¬pid1<t>)
            S = MEM;
        if (vmask && pid1<t>)
            MEM = N;
        vmask = vmask & pid1<t>;
    }
}

Key advantages of forming the Orthogonal Tree from the polymorphic mesh are:

(1) reduction of chip area: the chip areas required to lay out the mesh and the orthogonal tree are O(N^2) and
O((N^2)*(log N)^2) respectively, where N is the side size of the mesh and the number of leaves of the
orthogonal tree. This represents a saving by a factor of (log N)^2. For N = 1024, the chip area used by the
polymorphic mesh is 1/100 of that of the orthogonal tree.

(2) efficient neighborhood operations: PEs in the Orthogonal Tree are not connected to their geographically
nearest neighbors; hence for image processing, many important neighborhood operations cannot be
performed efficiently because there are no direct communication paths. In fact, more than half of the data in
a 3x3 window must be passed up one level in the tree then passed down to the center of the window; the
average distance among data in a 3x3 window is 2.125 as against 1.5 in the polymorphic mesh. A sample
3x3 window for the orthogonal tree is shown in Figures 8 and 10, and the number in the circle is the distance
between that datum and the centre of the window.

As summarized in Figure 7 for N=4, between the polymorphic mesh and the orthogonal tree, the ratio
of chip area is 16:46 and that of average distance in a 3x3 window is 1.5:2.1 (Figures 8 and 10).

The control algorithm ORTH_TREE uses register X for pid0, SRX to copy pid0, F1 for hmask, and
produces pid0<t> at time step t in BSRX. Symmetrically, the control algorithm uses register Y for pid1, SRY
to copy pid1, F2 for vmask, and produces pid1<t> at time step t in BSRY. The conditional SHORT_WE
and SHORT_NS are based on BSRX, BSRY, F1 and F2.

(P5) Reverse-Row Tree

The RR-Tree is a top-down tree (as against row trees, which are bottom-up) which is formed from the
top level to the bottom level of the tree, a reverse process of the row tree. The control algorithm is shown
below.

RR-TREE ()
{
    int t;                  /* t is time step */
    int pid0;               /* column position of a PE */
    int M, logM;            /* M is the side size of the mesh and logM = log M */
    int treemask = M/2;     /* a flag to construct the tree */
    int mask;               /* an intermediate condition */

    for (t = 0; t < logM; t++) {
        mask = ANDALLBIT (¬treemask | pid0);
        if (¬mask)
            {SHORT_WE; DISABLE;}
        if (mask && pid0<logM-t-1>)
            W = MEM;
        if (mask && ¬pid0<logM-t-1>)
            MEM = E;
        treemask = ASHIFT (treemask, 1);
    }
}



As shown in Figure 11 by an 8-PE example, the control algorithm can be described as follows. The flag
treemask is initialized as one half of the total number of PEs (i.e. 4 = 100). The INVERTed "treemask" is
first "ORed" with pid0, then the result is passed to ANDALLBIT which returns a '1' in 'mask' if all bits of the
result are '1' and returns a '0' otherwise. The PEs with mask=1 (i.e. PE 3 and 7) are part of the tree and the
rest, not being tree nodes, will disable themselves and SHORT their W-E path to establish the tree
connection. For PE 3 and 7, bit 2 of pid0 is further checked; a '1' in this bit lets PE 7 send a datum to the
receiver PE 3 whose bit 2 is '0'. This forms the top level of the tree at t=0.

At the end of t=0, treemask is shifted arithmetically one bit to the right. It therefore becomes 110 for
the next time step.

At t=1, the same process identifies that PE 1, 3, 5 and 7 are the tree nodes, and PE 7 sends data to PE
5 and PE 3 to PE 1. Treemask becomes 111 at the end of t=1.

At t=2, every PE is a tree node and each PE with odd pid0 sends data to its neighbor with lower even
pid0.

Using the mechanism in the connection control block, "treemask" is loaded to SRM, and X copied to SRX.
Register X is "ORed" with the INVERTed SRM first; the result is then "ANDALLBITed" to produce "mask".
SRX is shifted left logically to produce pid0<logM-t-1> in BSRX at time step t. This is used to control the
SEND/RECEIVE action for a pair of the tree nodes.

The RR-Tree is mainly for propagating a datum to all tree nodes, which will perform different operations
on this datum depending on their positions in the tree. For computer graphics, this pattern is useful because
it allows each PE to generate A*X simultaneously where A is a constant and X is pid0. Evaluating A*X in
parallel allows the fast generation of a line; this will be further discussed in another pattern called
Diagonal-Span-Tree (P12).

In general, a reverse tree is used to convert a symbolic representation in a parameter space to an
iconic representation in image space, then the algorithm is performed iconically with the massive parallelism
available in the polymorphic mesh.

Although the control algorithm uses the PE with the highest pid0 as the root of the tree, the PE with the
lowest pid0 can be used as the root as well and the control algorithm is of equivalent complexity.
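
The top-down propagation can be traced with a small simulation of one row. It is a sketch under a stated assumption about polarity: the mask is computed here as ANDALLBIT(treemask | pid0) with treemask taking the values 100, 110, 111 for eight PEs, because that reading reproduces the node sets given in the walkthrough (PE 3 and 7 at t=0, then PE 1, 3, 5 and 7). How the inversion in the listing maps onto the SRM true and complement outputs is not modelled, and the function names are hypothetical.

```python
def andallbit(value, width):
    """1 if all `width` low-order bits of value are 1, else 0."""
    full = (1 << width) - 1
    return int((value & full) == full)

def rr_tree_broadcast(m, datum):
    """Propagate `datum` from the PE with the highest pid0 to a row of m PEs.

    Returns (final MEM contents, tree nodes at each time step).
    """
    log_m = m.bit_length() - 1
    assert 1 << log_m == m, "row length must be a power of two"
    mem = [None] * m
    mem[m - 1] = datum
    treemask = m >> 1
    nodes_per_step = []
    for t in range(log_m):
        mask = [andallbit(treemask | pid, log_m) for pid in range(m)]
        nodes_per_step.append([pid for pid in range(m) if mask[pid]])
        bit = log_m - t - 1
        new_mem = list(mem)
        for pid in range(m):
            if mask[pid] and not (pid >> bit) & 1:   # receiver: MEM = E
                east = pid + 1
                while not mask[east]:                # disabled PEs: SHORT_WE
                    east += 1
                new_mem[pid] = mem[east]
        mem = new_mem
        treemask = (treemask >> 1) | (m >> 1)        # arithmetic shift right
    return mem, nodes_per_step

mem, nodes = rr_tree_broadcast(8, "A")
print(nodes[0])   # [3, 7]
print(nodes[1])   # [1, 3, 5, 7]
print(mem)        # every PE holds "A"
```

After log M steps every PE in the row holds the datum that started at PE 7.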



(P6) Reverse Column Trees (RC-Tree)

Similar to the RR-Tree, the RC-Tree can be formed by using pid1 as the control and N-S as the path to
establish the tree connection. This is shown in the following control algorithm.

RC-TREE ()
{
    int t;                  /* t is time step */
    int pid1;               /* row position of a PE */
    int M, logM;            /* M is the side size of the mesh and logM = log M */
    int treemask = M/2;     /* a flag to construct the tree */
    int mask;               /* an intermediate condition */

    for (t = 0; t < logM; t++) {
        mask = ANDALLBIT (¬treemask | pid1);
        if (¬mask)
            {SHORT_NS; DISABLE;}
        if (mask && pid1<logM-t-1>)
            N = MEM;
        if (mask && ¬pid1<logM-t-1>)
            MEM = S;
        treemask = ASHIFT (treemask, 1);
    }
}

The properties of the RC-Trees are the same as those of RR-Trees except that RC-Trees are related to the data in
the columns of the mesh.

(P7) Row-Bus

For broadcasting purposes, a bus is a very useful pattern whose broadcasting distance is the shortest.
One bus can be formed for every row of the polymorphic mesh by the following control algorithm.

ROW_BUS ()
{
    int sender;     /* ID for the sender */
    int pid0;

    SHORT_WE;
    if (pid0 == sender)
        E = MEM;
    else
        MEM = W;
}



A PE in a row is designated as the "sender" and the rest of the PEs are the receivers. All PEs
"SHORT" their E-W path to establish the bus, and the sender will send the data to E (or W) while the
receivers can receive the data from W (or E). (In another case, when a datum is injected into W or E by the
external controller, there is no "sender" PE and all PEs are the "receivers".)

Using the mechanism provided by the connection control block, "sender" is loaded to SRM. The
"INVERTed" SRM is "XORed" with register X which stores pid0; the resulting bits are "ANDALLBITed". A
'1' in XANDALL identifies the PE as a sender and those PEs with XANDALL=0 are the receivers.

(P8) Column Bus

Similar to Row Bus, a Column Bus can be formed for each column of the polymorphic mesh by using
pid1 as the control and N-S as the path, as shown in the following control algorithm.

COLUMN_BUS ()
{
    int sender;     /* ID for the sender */
    int pid1;

    SHORT_NS;
    if (pid1 == sender)
        S = MEM;
    else
        MEM = N;
}



The property of the column bus is the same as that of the row bus.

In combination, the row bus and the column bus can be used to broadcast a common datum to all PEs
in the mesh in two steps. At the first step, the common datum can be broadcast to all PEs in the top row;
then at the second step, the PEs in the top row can broadcast the common datum to all other PEs along the
column direction.
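
Both the sender test and the two-step broadcast are easy to check in software. The sketch below treats each bus as an ideal shorted wire and uses hypothetical function names. The sender test is the one described for the Row-Bus: invert "sender", XOR it with the PE's position, and apply ANDALLBIT, which yields 1 only when the two values are equal.

```python
def is_sender(pid, sender, width):
    """XANDALL for the bus: ANDALLBIT(INVERT(sender) XOR pid) over `width` bits."""
    full = (1 << width) - 1
    return int(((~sender ^ pid) & full) == full)

def broadcast(m, datum, sender_col=0):
    """Two-step broadcast of `datum` held by PE (0, sender_col) to an m x m mesh."""
    width = max(1, (m - 1).bit_length())
    mem = [[None] * m for _ in range(m)]
    mem[0][sender_col] = datum
    # Step 1: row bus on the top row; receivers latch what the sender drives.
    driven = next(mem[0][c] for c in range(m) if is_sender(c, sender_col, width))
    for c in range(m):
        if not is_sender(c, sender_col, width):
            mem[0][c] = driven
    # Step 2: column buses; the top-row PE of each column is the sender.
    for c in range(m):
        for r in range(1, m):
            mem[r][c] = mem[0][c]
    return mem

print([is_sender(pid, 5, 3) for pid in range(8)])   # [0, 0, 0, 0, 0, 1, 0, 0]
print(broadcast(4, 7, sender_col=2))                # every entry is 7
```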



(P9) Pyramid

Pyramid configuration is powerful in image processing and computer vision mainly because of its
capability in handling multi-resolution images. This pattern can be formed from the mesh by the following
control algorithm:

PYRAMID ()
{
    int t;                      /* t is time step */
    int pid0;                   /* column position of a PE */
    int pid1;                   /* row position */
    int M, logM;                /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;   /* two flags to construct the pyramid */

    for (t = 0; t < logM; t++) {
        /* cycle 1 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && ¬pid0<t> && ¬pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && ¬pid1<t>)
            {N = MEM; MEM1 = W;}
        if (hmask && vmask && ¬pid0<t> && pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && pid1<t>)
            {MEM0 = N; MEM1 = W;}

        /* cycle 2 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && ¬pid0<t> && ¬pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && ¬pid1<t>)
            S = MEM1;
        if (hmask && vmask && ¬pid0<t> && pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && pid1<t>)
            MEM1 = N;

        hmask = hmask & pid0<t>;
        vmask = vmask & pid1<t>;
    }
}

The control algorithm consists of log M steps and within each step there are two control cycles. In
other words, each step forms a level of the pyramid in two PE cycles.

Figures 12, 13 and 14 depict the pyramid control algorithm by a sample 8x8 mesh. Two masks, hmask
(for row) and vmask (for column), are initialized as '1' such that all PEs in the mesh are 'enabled' at the
first time step. At t=0, all PEs are active and every 2x2 PEs are formed as a group. These four (2x2) PEs
are the NW, NE, SW and SE sons of the pyramid and the parent is the same as the SE son. The activities of
the four sons are distinguished by the pid0<t> and pid1<t> bits. The SE son (or the parent), being
designated by pid0<0> = pid1<0> = 1, will receive data from the SW (pid0<0>=0 and pid1<0>=1) and NE
(pid0<0>=1 and pid1<0>=0) sons at the first cycle. In this cycle, the NW son routes its data to the NE
son; this data will be received by the parent at the second cycle. At the second cycle, only the NE son and
the parent are involved in sending and receiving; the other two PEs have no action. Both vmask and hmask
are updated to control the connection of the next time step.

At t=1, again four PEs form the four sons and one parent of the next-level pyramid. But these four PEs
span a 4x4 mesh as shown in Figure 13. The activity of the four sons and the parent is the same as at t=0
except that PEs at even rows or even columns are disabled. These disabled "non-pyramid" PEs SHORT
their W to E line, and N to S line to establish the pyramid connection.

PEs that constitute the last-level pyramid are shown in Figure 14. Their activities are the same as in the
previous two steps.
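
The level-by-level grouping can be sketched as follows. The simulation collapses the two routing cycles of each step into one update and records what each parent holds afterwards. The `combine` operation (a sum by default) and the function name are assumptions, since the listing only moves data between PEs.

```python
def pyramid_levels(image, combine=lambda nw, ne, sw, se: nw + ne + sw + se):
    """Build pyramid levels bottom-up on an M x M mesh, M a power of two.

    At step t the active PEs are those whose row and column positions have
    bits 0..t-1 all set (hmask and vmask still 1). In each active group of
    four, the SE son doubles as the parent. Returns one dict per level that
    maps parent position (row, column) to its combined value.
    """
    m = len(image)
    log_m = m.bit_length() - 1
    assert 1 << log_m == m, "mesh side must be a power of two"
    mem = [row[:] for row in image]
    levels = []
    for t in range(log_m):
        step = 1 << t                        # spacing between active PEs
        active = range(step - 1, m, step)
        level = {}
        for r in active:
            for c in active:
                if (r >> t) & 1 and (c >> t) & 1:    # SE son = parent
                    nw = mem[r - step][c - step]
                    ne = mem[r - step][c]
                    sw = mem[r][c - step]
                    mem[r][c] = combine(nw, ne, sw, mem[r][c])
                    level[(r, c)] = mem[r][c]
        levels.append(level)
    return levels

levels = pyramid_levels([[1] * 4 for _ in range(4)])
print(levels[0])   # {(1, 1): 4, (1, 3): 4, (3, 1): 4, (3, 3): 4}
print(levels[1])   # {(3, 3): 16}
```

On an 8x8 mesh the parents at the three steps sit at odd positions, then at positions 3 and 7, and finally at (7, 7), which matches the description of even rows and columns being disabled at t=1.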

Orthogonal to the pyramidal structure described above, the pyramid pattern has a mesh connection at
each level. That is, for each node in the pyramid there exist four neighbors (N, S, E, W) at the same level
other than four sons at the level below and one parent at the level above. This relation is shown in Figure
15.

The control algorithm for obtaining the neighbors at the same level has been embedded in the above
pyramid control algorithm. For example, at t=0 as shown in Figure 8a, the neighbors at the same level are
connected by the original mesh. At t=1, the neighbors at the same level are scattered in every other row
and column and the mesh connection for them has been established by the above-mentioned control
algorithm PYRAMID.

To obtain the content of its neighbors at the same level of the pyramid, two control cycles are added to
every step of the control algorithm PYRAMID as follows. Cycle 3 is to obtain the content of N and W while
cycle 4 is to obtain that of S and E.



PYRAMID ()
{
    int t;                      /* t is time step */
    int pid0;                   /* column position of a PE */
    int pid1;                   /* row position */
    int M, logM;                /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;   /* two flags to construct the pyramid */

    for (t = 0; t < logM; t++) {
        /* cycle 1 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && ¬pid0<t> && ¬pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && ¬pid1<t>)
            {N = MEM; MEM1 = W;}
        if (hmask && vmask && ¬pid0<t> && pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && pid1<t>)
            {MEM0 = N; MEM1 = W;}

        /* cycle 2 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && ¬pid0<t> && ¬pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && ¬pid1<t>)
            S = MEM1;
        if (hmask && vmask && ¬pid0<t> && pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && pid1<t>)
            MEM1 = N;

        /* cycle 3 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask) {
            S = MEM3;
            E = MEM4;
            MEM3 = N;
            MEM4 = W;
        }

        /* cycle 4 action */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask) {
            N = MEM5;
            W = MEM6;
            MEM5 = S;
            MEM6 = E;
        }

        hmask = hmask & pid0<t>;
        vmask = vmask & pid1<t>;
    }
}

Using the mechanism in the connection control block, hmask and vmask are loaded to F1 and F2
respectively. The pid0 in register X and pid1 in register Y are copied to SRX and SRY respectively. Shifting
logically to the right, BSRX and BSRY will contain pid0<t> and pid1<t> at time step t. These two condition
bits along with F1 and F2 are used to implement the SHORT actions.

The above-described pyramid has a base of MxM and a shrinkage of 2, meaning that the level above
the base has M/2 x M/2 PEs and so on. The PYRAMID control algorithm can be extended to handle any
shrinkage K, where K is a power of 2, by updating hmask and vmask by
hmask = hmask & pid0<t> & pid0<t+1>
vmask = vmask & pid1<t> & pid1<t+1>
and by skipping the pyramid node actions on every odd t step.

(P10) Reverse Pyramid

Information flows from the bottom to the top level in a pyramid for iconic to symbolic conversion. However,
there is a need for the information to flow in the opposite direction for symbolic to iconic conversion. This can
be served by the Reverse Pyramid (R-Pyramid) formed from the polymorphic mesh by the following control
algorithm.



R-PYRAMID ()
{
    int t;              /* t is time step */
    int pid0;           /* row position of a PE */
    int pid1;           /* column position of a PE */
    int M, logM;        /* M is the side size of the mesh and logM = log M */
    int mask = M/2;     /* a flag to construct the pyramid */
    int hmask, vmask;

    for (t = 0; t < logM; t++) {
        hmask = ANDALLBIT (¬mask | pid0);
        vmask = ANDALLBIT (¬mask | pid1);

        /* cycle 1 */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && pid0<logM-t-1> && pid1<logM-t-1>)
            N = MEM2;
        if (hmask && vmask && pid0<logM-t-1> && ¬pid1<logM-t-1>)
            MEM2 = S;
        if (hmask && vmask && ¬pid0<logM-t-1> && pid1<logM-t-1>)
            NO_ACTION;
        if (hmask && vmask && ¬pid0<logM-t-1> && ¬pid1<logM-t-1>)
            NO_ACTION;

        /* cycle 2 */
        if (¬hmask | ¬vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && pid0<logM-t-1> && pid1<logM-t-1>)
            {N = MEM1; W = MEM3;}
        if (hmask && vmask && pid0<logM-t-1> && ¬pid1<logM-t-1>)
            {MEM1 = S; W = MEM2;}
        if (hmask && vmask && ¬pid0<logM-t-1> && pid1<logM-t-1>)
            MEM3 = E;
        if (hmask && vmask && ¬pid0<logM-t-1> && ¬pid1<logM-t-1>)
            MEM2 = E;

        mask = ASHIFT (mask, 1);
    }
}

The control algorithm for the R-PYRAMID is a reverse process of the PYRAMID control algorithm and
is an expansion of the RR-TREE and RC-TREE.

One half of the mesh size (the "mask") is loaded to register SRM, while pid0 in X and pid1 in Y are
copied to SRX and SRY respectively. The "INVERTed" SRM is "ORed" with X then "ANDALLBITed" to
produce a flag "hmask" in XANDALL. Similarly, "vmask" is produced in YANDALL. Along with these two
condition bits, pid0 and pid1 are shifted logically from left to right to produce pid0<logM-t-1> and
pid1<logM-t-1> at time step t in BSRX and BSRY respectively. Figures 16, 17 and 18 illustrate the forming
of a pyramid in an 8x8 polymorphic pyramid.

In a similar way to PYRAMID, each step of the R-PYRAMID control algorithm consists of two cycles:
the first cycle is an intermediate stage of sending data to the NW son (data are routed to the NE son in the
first cycle and to the NW son at the second cycle); and at the second cycle, while the NE son is routing the data to
the NW son, the parent sends data to the NE and SW sons at the same time.

Connection for the neighbors at the same level of the pyramid can also be established by the
R-PYRAMID control algorithm similar to PYRAMID. However, communication at the same level is not used in
general for information flowing top-down. When it is necessary, actions similar to cycles 3 and 4 of the
PYRAMID control algorithm can be used.

The above-described pyramid has a base of MxM and a shrinkage of 2, meaning that the level above
the base has M/2 x M/2 PEs and so on. The R-PYRAMID control algorithm can be extended to handle any
shrinkage K, where K is a power of 2, by initializing SRM by M/(log K) and shifting SRM log K bits per step.

(P11) Cube

Cube is the most natural extension of the polymorphic mesh into a 3D structure. The usefulness of the
cube to 3D data structures (e.g. 3D image element = volume = voxel) can be analogously described as the
usefulness of the mesh to 2D data structures (e.g. 2D picture element = area = pixel).

To form the cube from the mesh, the data structure in the third dimension is sliced vertically and
allocated to one PE such that the communication in the third dimension can be accomplished by the local
memory communication while the communication involving the other two dimensions is served by the
mesh. In fact, the polymorphic feature is not used in forming this pattern and the cube formation may not
be as novel as the other eleven patterns. Nevertheless, the cube pattern is supported by the polymorphic
mesh with the saving of the connection pins in the third dimension. Since the saving is significant (2xMxM
pins in total), the forming of the cube from the polymorphic mesh is important to its VLSI implementation.

With the data slicing, an MxMxK cube can be formed, where K is an integer and the value of K is only
limited by the amount of local memory in the PE.



(P12) DiagonahSpan Tree (DST) 

A Diagonal-Span-Tree (DST) is a binary tree whose leaves span the diagonal of the mesh once and only
once. By this definition, the DST in an NxN mesh has N leaves, each of which occupies a diagonal node
designated as PE(k, N-1-k), k=0 to N-1. There are many possible DSTs in a mesh. We choose the one
shown in Figure 19 (exemplified by a 3-level DST in an 8x8 mesh) because it is simple to control.

As shown in Figure 19, the root of the DST is at PE(0, 0) (upper left corner of the mesh). The left son of
the root is four units away vertically (i.e. PE(4, 0)) while its right son is four units away horizontally
(i.e. PE(0, 4)).

The second level sons are two units away from the corresponding first level sons vertically and
horizontally. Thus PE(0, 6) and PE(2, 4) are the sons of PE(0, 4), and PE(4, 2) and PE(6, 0) are the sons of
PE(4, 0).

Spanning in a similar way, all diagonal PEs of the mesh are the third level sons. 

In a general definition, an Upper-Left DST (ULDST) in an NxN mesh has PE(0, 0) as the root and the
diagonal PEs (k, N-1-k), k=0 to N-1, as the leaves. The i-th level (i = 1 to logN) left son of PE(s, t) is
PE(s + 2^(logN - i), t) and the i-th level right son of PE(s, t) is PE(s, t + 2^(logN - i)).
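The general definition can be checked with a minimal sketch that enumerates the tree level by level. The function name `uldst_levels` is a hypothetical helper introduced here for illustration; only the son-offset rule 2^(logN - i) is taken from the definition.

```python
def uldst_levels(n):
    """Return the ULDST of an n x n mesh as a list of levels of (row, col) nodes."""
    log_n = n.bit_length() - 1
    assert 1 << log_n == n, "n must be a power of 2"
    levels = [[(0, 0)]]                     # level 0: the root PE(0, 0)
    for i in range(1, log_n + 1):
        step = 1 << (log_n - i)             # 2^(logN - i)
        level = []
        for s, t in levels[-1]:
            level.append((s + step, t))     # i-th level left son
            level.append((s, t + step))     # i-th level right son
        levels.append(level)
    return levels


levels = uldst_levels(8)
leaves = sorted(levels[-1])
```

For the 8x8 example, the last level contains exactly the eight PEs (k, 7-k), so each diagonal node is reached once and only once.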

The control algorithm for ULDST is listed below. 



ULDST ()
{

int fs=0, fr=0;      /* flag-send is used to construct the DST */
                     /* flag-receive is an intermediate var to update fs */
int pid0, pid1;
int t;               /* t is time step */

int M, logM;         /* M is the side size of the mesh and logM = log M */
int treemask = M/2;  /* a flag to construct the tree */

if (pid0 == 0 && pid1 == 0) {
    fs = 1; fr = 1; }

for (t=0; t < logM; t++) {

    hmask = ANDALLBIT ( ¬treemask | pid0 );
    vmask = ANDALLBIT ( ¬treemask | pid1 );

    if (¬hmask | ¬vmask)
        {SHORT_WE; SHORT_NS; DISABLE;}

    if (hmask | vmask && fr)
        {E = fs; S = fs; fr = 0;}

    if (hmask | vmask && ¬fr)
        {fs = W | N; fr = W | N;}

    treemask = ASHIFT ( treemask, 1);

}
}



Using an 8x8 polymorphic mesh as an example, the ULDST algorithm can be explained as follows. A
root PE(000, 000) is selected for the DST and its fs and fr are set to 1. At time step t=0, rows 000 and 100
and columns 000 and 100 are active while the rest are disabled. The disabled PEs short their WE and NS
to establish a temporary path for updating fs. The active PE with fr = 1 sends its fs value to its E and S
neighbors, then resets fr to 0 so that it will not be a sender at the next time step. The receivers (with fr = 0)
update their fs with the ORed value of N and W. The receivers also update their fr with the same ORed result;
therefore a PE that has just received a 1 from N or W will be a sender at the next time step. Step t=0 selects
two PEs (PE(000, 100) and PE(100, 000)) as nodes of the DST (by setting their fs = 1); furthermore, this step
prepares them as the new senders (by setting fr = 1) to set more DST nodes in the following step.

At the next step, each of the two new senders will produce two DST nodes and two senders in a similar
way. The new nodes and senders are PEs (000, 110), (010, 100), (100, 010) and (110, 000).

At t=2, the diagonal PEs are reached by the control algorithm; their fs will be set to 1 to identify
themselves as part of the DST. Furthermore, their fr will be set to 1, which is additional information to
identify themselves as diagonal nodes. The diagonal identification is a bonus from the DST algorithm. It is
useful for many types of computing to be discussed in the next section.
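The three steps walked through above can be replayed in a small simulation. This is a simplified sketch under stated assumptions, not the control algorithm itself: the shorted (disabled) PEs are modeled as a jump of `stride` positions to the next active PE, and a PE that already holds fs = 1 is assumed to keep it. The name `build_uldst_flags` is hypothetical.

```python
def build_uldst_flags(m):
    """Replay the sender/receiver wave on an m x m mesh; returns (fs, fr) grids."""
    log_m = m.bit_length() - 1
    fs = [[0] * m for _ in range(m)]
    fr = [[0] * m for _ in range(m)]
    fs[0][0] = fr[0][0] = 1                  # root PE(0, 0)
    for t in range(log_m):
        stride = m >> (t + 1)                # distance bridged by shorted PEs
        senders = [(r, c) for r in range(m) for c in range(m) if fr[r][c]]
        for r, c in senders:
            fr[r][c] = 0                     # a sender sends only once
            for rr, cc in ((r, c + stride), (r + stride, c)):   # E and S
                if rr < m and cc < m:
                    fs[rr][cc] = 1           # receiver becomes a DST node
                    fr[rr][cc] = 1           # and a sender for the next step
    return fs, fr


fs, fr = build_uldst_flags(8)
```

After the last step, fs marks the 15 nodes of the 3-level tree and fr is set only on the diagonal, which matches the diagonal identification described in the text.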

To form a DST, the flag fs is used as the condition: PEs with fs = 0 short WE and NS paths while PEs
with fs = 1 send MEM to E and S, and receive data from W and N.

Using the mechanism in the connection control block, "treemask" is loaded to SRM, fs to F1 and fr to
F2. The INVERTed SRM is ORed with pid0 in register X and pid1 in register Y respectively; the "ORed"
results are "ANDALLBITed" to produce "hmask" in XANDALL and "vmask" in YANDALL. At the end of
each step, SRM is shifted arithmetically one bit to the right. The SHORT action is then based on F2,
XANDALL and YANDALL.

By choosing a different root and using a similar control algorithm, different DSTs can be formed in the
same polymorphic mesh. Figure 20 shows a DST with the root in the lower right corner; it is called LRDST.

In the following we show that the coexistence of ULDST and LRDST allows us to compute A*X + B*Y + C
in parallel for each pixel (x, y) in an image. This capability has very wide application in computer
graphics and computer vision.



APPLICATIONS

Besides the well-known application of a plain mesh to image processing, the following applications are
either faster in the polymorphic mesh or not implemented in the plain mesh. These
applications are categorized into six types.



(1) Divide-and-conquer computing

This type of computing involves first dividing a set of N data into two groups according to a
property. Then, by applying the same property, each group is further divided into two subgroups. This
process is repeated until each group contains only one datum.

For mesh connection, this type of computing has the complexity of O(N) or higher. By transforming to
trees and pyramids, the complexity of this type of computing is O(log N) in the polymorphic mesh. The
speedup for a data set of N = 1024 is 100:1, a two orders of magnitude improvement.

Computations belonging to this type include sorting and finding the maximum, minimum, k-th largest and median.
All these algorithms are of complexity O(log N).
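As an illustration of the logarithmic step count, the following sketch finds a maximum by pairwise (tree) reduction and counts the parallel steps. It is a sequential model only; the helper `tree_max` is hypothetical, and reading the 100:1 figure as roughly N / log2 N is an assumption made here.

```python
import math


def tree_max(data):
    """Pairwise (tree) reduction; returns (maximum, number of parallel steps)."""
    level = list(data)
    steps = 0
    while len(level) > 1:
        # All pairs of one level would be combined in the same parallel step.
        level = [max(level[i:i + 2]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


value, steps = tree_max(range(1024))
speedup = 1024 / math.log2(1024)   # assumed reading of the speedup: N / log2 N
```

For N = 1024 the reduction takes 10 steps, and N / log2 N is about 102, consistent with the two orders of magnitude stated above.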



(2) Iconic-to-symbolic Conversion

This type of computing is specific to computer vision and is often called intermediate level processing.
Given an image, we are interested in knowing

(a) how many pixels satisfy a specified property;

(b) which pixels satisfy the specified property;

(c) whether SOME, NONE or ALL pixels satisfy the specified property.

The property for the above can be

(a) equal to a value;

(b) greater than a value;

(c) smaller than a value;

(d) a condition synthesized arithmetically and logically from the above.

All these algorithms can be computed in O(log N) steps in the polymorphic mesh by tree and pyramid
patterns. More importantly, and in significant contrast to the conventional fixed-pattern approach, only the
answer (as against the whole intermediate image) is output. This significantly reduces the I/O rate; in the
extreme case, only one bit (YES/NO) as against 1024x1024 bits (the whole image) is output.
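A minimal sketch of the conversion, written sequentially for clarity: an image is reduced to a count and a SOME/NONE/ALL verdict, so only the answer leaves the array. The function `summarize` and the sample image are illustrative assumptions; on the mesh the sum would be carried out by the tree or pyramid patterns.

```python
def summarize(image, predicate):
    """Reduce an image to a symbolic answer: (count, 'NONE' | 'SOME' | 'ALL')."""
    flags = [1 if predicate(p) else 0 for row in image for p in row]
    count = sum(flags)                 # on the mesh: a tree reduction
    if count == 0:
        verdict = "NONE"
    elif count == len(flags):
        verdict = "ALL"
    else:
        verdict = "SOME"
    return count, verdict


image = [[3, 9, 1], [7, 7, 2], [0, 8, 5]]
count, verdict = summarize(image, lambda p: p > 6)   # property: greater than a value
io_ratio = (1024 * 1024) // 1      # bits of a whole 1024x1024 image per one YES/NO bit
```

The ratio of 1,048,576 to 1 corresponds to the six orders of magnitude in I/O reduction for the extreme case.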

Related to I/O, an extra N-S path has traditionally been added to the mesh to support concurrent I/O and
processing. This mechanism and its benefit are also valid for the polymorphic mesh, but they are irrelevant to
the invention.






(3) Statistic Measurement

The polymorphic mesh is capable of computing the following statistics in O(log N) steps. The statistics
include:

(a) mean, variance, standard deviation;

(b) area, perimeter and centroid;

(c) first moment, second moment and cross moment.

Item (a) is general to a set of N data while items (b) and (c) are specific to an image.
The statistics are foundations of other algorithms. In computer vision, they are the basis for region
analysis and pattern recognition.
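Several of these statistics reduce to a handful of global sums, each of which is a reduction of the kind discussed under divide-and-conquer computing. The sketch below, with the hypothetical helpers `mean_variance` and `region_stats`, shows the sums involved; it assumes a binary image for the region case and the population (not sample) variance.

```python
def mean_variance(data):
    """Mean and population variance from two global sums."""
    n = len(data)
    s1 = sum(data)                     # sum of values
    s2 = sum(v * v for v in data)      # sum of squared values
    mean = s1 / n
    return mean, s2 / n - mean * mean


def region_stats(image):
    """Area and centroid of the nonzero region from three global sums."""
    area = sum_x = sum_y = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                area += 1
                sum_x += x             # first moment in x
                sum_y += y             # first moment in y
    if area == 0:
        return 0, None                 # empty region has no centroid
    return area, (sum_x / area, sum_y / area)
```

For example, `region_stats([[0, 1, 1], [0, 1, 1], [0, 0, 0]])` gives an area of 4 and a centroid of (1.5, 0.5).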



(4) Compute A*X + B*Y + C

To compute A*X + B*Y + C, four patterns need to be formed by the polymorphic mesh: they are one
Upper-Left-Diagonal-Span-Tree (ULDST), one Lower-Right-DST (LRDST), Row-Buses and Column-Buses.
The ULDST and LRDST must coexist to compute A*X + C and B*Y simultaneously, while the Row-Buses
and the Column-Buses are coexistent to do the summing (e.g. A*X + C + B*Y).

The algorithm is performed in a bit-serial manner. The extra two trees in the pixel-plane can be
eliminated. Constant arguments A and B are broadcast to all PEs before the computing begins and are
stored in arrays A and B with (logM-1) "0"s preempted at the beginning of the arrays. The storage of A and
B is bit-reversed so that after the preempted "0"s are accessed, the least significant bit of A and B will be
accessed first. The constant argument C is injected into the polymorphic mesh through W of the root of the
ULDST, one bit per time step, starting from the least significant bit. Using the ULDST as the tree to compute
A*X + C, each PE has three variables (sum, carry and delay). At each time step, "sum" is passed Eastbound
and "delay" is passed Southbound while each PE performs two operations: (a) add N with array A and store the
carry bit in "carry", and (b) store N in "delay". After logM steps, the diagonal PEs of the mesh (or the leaves
of ULDST) store A*X for the corresponding row. Similarly, the computing of B*Y can be done by LRDST
with "0" injected from E of the root and with each PE passing "delay" Northbound and "sum" Westbound.
After logM steps, the diagonal elements store B*Y for each corresponding column.

After obtaining A*X + C in rows and B*Y in columns, the polymorphic mesh changes to Row-buses in the
WE path and Column-buses in the NS path. Each PE then adds the value on the Row-bus to the value on the
Column-bus to produce A*X + B*Y + C in bit-serial fashion.
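The final summing step can be sketched as a bit-serial adder that consumes one bit of each operand per time step, least significant bit first, keeping a single carry bit between steps. The helpers `to_bits`, `from_bits` and `bit_serial_add` are hypothetical, and the sketch assumes non-negative operands of equal width; the text does not specify the number representation.

```python
def to_bits(value, width):
    """Non-negative integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def bit_serial_add(row_bits, col_bits):
    """Add two LSB-first bit streams, producing one result bit per time step."""
    carry = 0
    out = []
    for a, b in zip(row_bits, col_bits):
        total = a + b + carry
        out.append(total & 1)      # sum bit delivered at this time step
        carry = total >> 1         # carry bit kept inside the PE
    out.append(carry)              # final carry as the most significant bit
    return out


row_value = to_bits(13, 8)     # stands in for A*X + C on the Row-bus
col_value = to_bits(200, 8)    # stands in for B*Y on the Column-bus
result = from_bits(bit_serial_add(row_value, col_value))
```

With the sample operands 13 and 200, the stream decodes to 213.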

Since there is a conflict of resource in establishing the DSTs and the buses simultaneously, the result
of A*X + B*Y + C is delivered in bit-serial form at every other time step.



(5) Fast Line Detection 

With the capability of computing A*X + B*Y + C in every two time steps of the polymorphic mesh, every
pixel (X, Y) in an MxM image can decide whether it is on a given line determined by A, B and C.
Assuming that all numbers are K bits long, the decision can be made in logM + 2K time steps.
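The per-pixel decision amounts to examining the sign of A*X + B*Y + C. The sketch below evaluates it sequentially for every pixel; `line_side` and `on_line_mask` are hypothetical names, and indexing the mask as `mask[y][x]` is an assumption of this example.

```python
def line_side(a, b, c, x, y):
    """Return -1, 0 or +1: the sign of a*x + b*y + c for pixel (x, y)."""
    v = a * x + b * y + c
    return (v > 0) - (v < 0)


def on_line_mask(a, b, c, m):
    """m x m mask, indexed mask[y][x], that is True where the pixel lies on the line."""
    return [[line_side(a, b, c, x, y) == 0 for x in range(m)] for y in range(m)]


mask = on_line_mask(1, -1, 0, 4)   # the line x - y = 0 in a 4x4 image
```

A sign of zero marks a pixel on the line; the nonzero signs distinguish the two sides of the line.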

The capability of fast line detection is very useful in computer graphics and computer vision. For
computer graphics, it is useful in displaying convex polygons, in creating shadow, in clipping, in drawing
spheres, in computing adaptive histogram equalization, in texture mapping and anti-aliasing. For computer
vision, such a capability is useful in computing the Fast Hough Transform for detecting lines in a noisy image.



(6) Converting Symbolic information to iconic information

With the massively parallel hardware available in the polymorphic mesh, it is advantageous to convert
symbolic processing (usually not done in a mesh) into iconic processing so that the processing can be done
in a massively parallel way. The Fast Hough Transform mentioned above is one such example. The "mask
generating" to be described is another class of applications in this category.

(6.1) Band Mask Generation

A "band mask" is confined within two parallel lines, one of which is determined by (A, B, C1) and the
other by (A, B, C2). To generate a "band mask", each PE computes A*X + B*Y + C1 and
A*X + B*Y + C1 + (C2 - C1) as described above. The computing produces S1 (the sign of A*X + B*Y + C1)
and S2 (the sign of A*X + B*Y + C2), both of which are used in deciding whether pixel (X, Y) is inside the band.
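A minimal sketch of the band decision follows. It assumes that a pixel is inside the band when S1 and S2 differ, or when the pixel lies on one of the two bounding lines; the text does not spell out the rule for combining the signs. The function `band_mask` is hypothetical.

```python
def band_mask(a, b, c1, c2, m):
    """m x m mask (mask[y][x]) of pixels between the lines (a, b, c1) and (a, b, c2)."""
    mask = []
    for y in range(m):
        row = []
        for x in range(m):
            v1 = a * x + b * y + c1
            v2 = v1 + (c2 - c1)          # second line reuses the first result
            row.append(v1 * v2 <= 0)     # opposite signs, or on a bounding line
        mask.append(row)
    return mask


band = band_mask(1, 0, -1, -3, 5)   # band between the lines x = 1 and x = 3
```

In this example every row of the mask is `[False, True, True, True, False]`.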

The "band mask" gives a computer vision system the capability of processing only the region of
interest. Human vision adopts a different strategy in generating masks. The "strategy" is symbolic
information, but its processing is actually done iconically as described.

(6.2) Polygonal Mask Generation

The "polygonal mask" is a generalization of the band mask. It consists of the union of P half planes,
each of which is determined by a line specified by A*X + B*Y + C. Using the line detection capability, we can
obtain signs S1, S2 to SP for the corresponding lines. The Boolean combination of S1 to SP determines whether a
pixel (X, Y) is inside the polygon.
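One possible Boolean combination is sketched below for the case of a convex polygon, where a pixel is taken to be inside when every sign is non-negative (a logical AND of the P tests). This choice of combination and the function `polygon_mask` are assumptions of the example; other shapes would use other combinations of S1 to SP.

```python
def polygon_mask(half_planes, m):
    """m x m mask (mask[y][x]); half_planes is a list of (A, B, C) triples.

    A pixel is marked inside when A*x + B*y + C >= 0 holds for every triple.
    """
    return [
        [all(a * x + b * y + c >= 0 for a, b, c in half_planes) for x in range(m)]
        for y in range(m)
    ]


# Triangle bounded by x >= 0, y >= 0 and x + y <= 2.
triangle = polygon_mask([(1, 0, 0), (0, 1, 0), (-1, -1, 2)], 4)
inside_count = sum(map(sum, triangle))
```

For this triangle, six pixels of the 4x4 image fall inside the mask.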



CONCLUSIONS

The preferred embodiment of the invention carries out the following transforms under control of the
connection control mechanism:

The physical MxM mesh connection to one row and one column linear arrays, each MxM long.
The physical MxM mesh connection to M row trees, each of which has M leaves.
The physical MxM mesh connection to M column trees, each of which has M leaves.
The physical MxM mesh connection to an MxM orthogonal tree.
The physical MxM mesh connection to M reverse row trees, each of which has M leaves.
The physical MxM mesh connection to M reverse column trees, each of which has M leaves.
The physical MxM mesh connection to M row buses, each of which has M PEs.
The physical MxM mesh connection to M column buses, each of which has M PEs.
The physical MxM mesh connection to a pyramid with MxM base and a shrinkage K, where K is a power
of 2.
The physical MxM mesh connection to a reverse pyramid with MxM base and a shrinkage K, where K is a
power of 2.
The physical MxM mesh connection to an MxMxK cube, where K is an integer and is limited only by the
local memory of the PE.

The invention carries out the following transforms under control of programming outside the connection
control mechanism:

The physical MxM mesh connection to a DST tree (whose root can be at any corner of the mesh). Up to
two DSTs can coexist if their roots are at the opposite corners of a diagonal.

The physical MxM mesh connection to an MxM orthogonal tree in O(M^2) silicon area. A saving by a factor of
O((log M)^2) is obtained by the invention, where M is the side size of the orthogonal tree.

The class of divide-and-conquer algorithms of linear, O(M), complexity into logarithmic, O(log M), complexity.
A saving of M/logM is obtained by the invention. Such a class of algorithms is discussed in the description of
the preferred embodiment.

The iconic-to-symbolic conversion (intermediate level processing) occurs within the polymorphic mesh such
that the I/O is significantly reduced. In an extreme case, as discussed in the description of the preferred
embodiment of the invention, a reduction of six orders of magnitude is obtained.

Symbolic representation can be transformed into iconic representation by the above patterns such that the
processing can be performed iconically with the massive parallelism available in the mesh. Such a feature
expands the capability of the mesh into the domain of symbolic processing.

The mesh system is capable of:

(a) performing iconic processing,

(b) converting iconic information to symbolic information,

(c) converting symbolic information to iconic information and

(d) performing symbolic processing in its iconic equivalence.

The invention allows the computing of A*X + B*Y + C in an MxM mesh in O(log M) steps for every pixel
(x, y) in parallel, where A, B and C are constant integers.

The invention allows the detection, for every pixel (x, y) in an MxM image, of whether the pixel (x, y) is (a)
on a line, (b) to the right of a line or (c) to the left of a line in O(log M) steps.

The invention allows the parallel detection, for every pixel (x, y) of an MxM image, of whether the pixel is
inside or outside of a band, where the band is formed by two parallel lines.

The invention allows the parallel detection, for every pixel (x, y) of an MxM image, of whether the pixel is
inside or outside of a polygon.

The invention can be generalized for a 3D mesh (physical cube) to form the higher-dimension extension
of the twelve patterns by adding a register Z, a shift register SRZ and a flag F3 to the connection control
mechanism (e.g. the higher-dimension extension of a cube is a 4D hypercube).

The 3D image element is the voxel. The voxel is the volume element analogous to the 2D pixel, which
has area only. The 3D extension of the invention allows the parallel detection for each voxel (x, y, z)
whether the voxel is inside or outside of

(a) a region formed by two parallel planes,

(b) a polyhedron, or

(c) whether the voxel is on, to the left or to the right of a plane.

The invention applies the concept of "polymorphic" to a physical mesh via a "connection control
mechanism." The same concept and mechanism can be generalized to other physically-fixed connections.

The pattern formed by the polymorphic mesh can be adaptive to the nature of the data by loading the
F1 and/or F2 registers via the output of the ALU.

Arbitrary patterns can be formed by the polymorphic mesh by setting the F1 and F2 registers via
instruction or memory. The instruction, the memory value, intermediate processing values from a neighboring
processing element and diagnostic information within the processing element are representative system
operation parameters which may be used to set the flag registers, by well known techniques and simple
means not shown. The connection control mechanism thus comprises flag register means, which flag
register means is settable as a function of system operation parameters, to provide control information
usable in setting a new pattern into at least one of the pattern registers. This capability, accessible to the
programmer, permits the programmer to set up adaptation to data-related and condition-related future event
possibilities. Upon occurrence of such an event, a flag register is set and a new pattern is fetched or
calculated in response.

Thus, while the invention has been described with reference to a preferred embodiment, it will be
understood by those skilled in the art that various changes in form and details may be made without
departing from the scope of the invention.


Claims 

1. An optimizable reconfigurable array processing system for performing under programmable control a
series of programmably defined tasks upon input images, having differing optimal system configurations as
a composite function of task definition and image input, so that at differing times under differing conditions
there are identifiable differing optimal configurations, comprising:
system control means (3) including operator input means, image input means and operation control means,
an array of polymorphic mesh processing elements (2), each comprising:
a memory (7);
an ALU (6) connected to said memory; and
connection control mechanism (8), with a finite number of simple connection paths to said ALU (6) and to a
related subset of said array of polymorphic mesh processing elements (2), with means (10) to form
selective internal and external interconnections of the polymorphic mesh processing element according to a
connection control pattern, and means (21, 22, 23) to provide a connection control pattern.

2. An optimizable reconfigurable array processing system according to Claim 1, wherein said connection
control mechanism comprises pattern presentation means (21, 22) making available to the processing
element a surplus of pattern bit values defining a standard set of pattern values and an alternate set of
pattern values, and pattern value selection means (23) for selecting said standard set of pattern values or,
alternatively, for selecting said alternate set of pattern values,
and said connection control mechanism also comprises switching means (10) responsive to the selected
pattern bit values.

3. An optimizable reconfigurable array processing system according to Claim 2,
wherein said connection control mechanism (8) comprises a plurality of pattern registers (21, 22), including a
standard pattern register and an alternate pattern register, and a crossbar switch (10), with external
connections to neighboring processing elements, with internal connections to said plurality of pattern
registers and to said ALU (6), and with control connections to said plurality of pattern registers; and
wherein said pattern presentation means comprises pattern register selection means (23), for selecting one
pattern register;
and said means controlling said connection control mechanism in accordance with an optimization pattern
includes gate switching means (16-20) responsive to the setting in the selected pattern register.
4. An optimizable reconfigurable array processing system according to Claim 3, wherein said connection
control mechanism comprises two pattern registers (21, 22), and said pattern register selection means
for selecting one pattern register is a binary device (23).

5. An optimizable reconfigurable array processing system according to Claim 3, wherein said connection
control mechanism comprises flag register means (31, 32), said flag register means being settable as a
function of system operation parameters, to provide control information usable in setting a new pattern into
at least one of said pattern registers.

6. A dynamically optimizable reconfigurable array processing system comprising system control means
including operator input means, image input means, operation control means and operation monitoring
means controlling overall system operation, simultaneously monitoring so as to determine optimal system
configuration as a composite function of operator input means and operation monitoring means, providing a
signal defining an optimal configuration selection; and
an array of polymorphic mesh processing elements, each having a memory and an ALU, and each having a
finite number of connection paths to related polymorphic mesh processing elements,
each having a polymorphic mesh connection control block having capability to form selective interconnection
of the polymorphic mesh processing element by short-circuiting selected connection paths, and by
selecting a logical connective, and
means connecting said optimizing reconfiguration control signal to said polymorphic mesh connection
control block.

7. A processing element for a dynamically optimizable reconfigurable array processing system comprising
a multiplicity of processing elements generally controlled by a host computer to carry out image
processing as a network, comprising:
memory means;
processing means;
I/O connection means;
internal connection means;
connection control means controlling the relationships of said other means in accordance with a control
pattern; and
means to alter the control pattern in said connection control means.

8. A processing element for a dynamically optimizable reconfigurable array processing system according
to Claim 7, wherein said means to alter the control pattern in said connection control means is means
responsive to intermediate level processing in a related processing element.

9. A processing element for a dynamically optimizable reconfigurable array processing system according
to Claim 7, wherein said means to alter the control pattern in said connection control means is means
responsive to fault indication in a related processing element.

10. A processing element for a dynamically optimizable reconfigurable array processing system
according to Claim 7, comprising in addition external memory data connection means (EMD 102)
connecting to said memory means, processor means and connection control means (8).

11. A processing element for a dynamically optimizable reconfigurable array processing system
according to Claim 10, comprising in addition a multiplexer and a plurality of bus connections for said
external memory data connection to said memory means, said processing means and said connection
control mechanism,
whereby said processing element may be operating local memory and external memory simultaneously.